Refactor JIT kernel CI to use run_suite.py registration system by merrymercy · Pull Request #21239 · sgl-project/sglang

merrymercy · 2026-03-23T23:32:17Z

Summary

- Migrate pr-test-jit-kernel.yml from raw pytest/shell invocation to the centralized run_suite.py + register_cuda_ci registration system
- Introduce three new suites: stage-b-kernel-unit-1-gpu-large, stage-b-kernel-benchmark-1-gpu-large, nightly-kernel-1-gpu
- Add register_cuda_ci calls to all 35 test files and 23 benchmark files under python/sglang/jit_kernel/
- Extend the run_suite.py glob to discover JIT kernel tests and benchmarks alongside test/registered/
- Fix missing __main__ guards in test_nvfp4_*.py files, and fix test_custom_all_reduce.py to dispatch between the torchrun worker and pytest

Test plan

- Verify jit-kernel-unit-test job discovers and runs all unit tests via run_suite.py --hw cuda --suite stage-b-kernel-unit-1-gpu-large
- Verify jit-kernel-benchmark-test job discovers and runs all benchmarks via run_suite.py --hw cuda --suite stage-b-kernel-benchmark-1-gpu-large
- Verify jit-kernel-unit-test-nightly job runs full tests via run_suite.py --hw cuda --suite nightly-kernel-1-gpu --nightly
- Verify disabled tests (test_custom_all_reduce, bench_custom_all_reduce, bench_norm_impls) are correctly skipped
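The discovery and skip behavior in the test plan can be sketched roughly as follows. This page does not show how run_suite.py collects registrations, so the approach here (statically parsing `register_cuda_ci(...)` calls with `ast`) is an assumption, as are the file paths, the `disabled` keyword, and the skip reason. In-memory source strings stand in for files found by a glob.

```python
import ast
from typing import Dict, List, Tuple


def parse_registrations(source: str) -> List[Dict[str, object]]:
    """Collect keyword arguments of every register_cuda_ci(...) call."""
    regs = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "register_cuda_ci"):
            regs.append({kw.arg: ast.literal_eval(kw.value)
                         for kw in node.keywords if kw.arg is not None})
    return regs


def select_files(files: Dict[str, str], suite: str
                 ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split files registered for `suite` into runnable and skipped."""
    selected, skipped = [], []
    for path in sorted(files):
        for reg in parse_registrations(files[path]):
            if reg.get("suite") != suite:
                continue
            reason = reg.get("disabled")
            if reason:
                skipped.append((path, reason))
            else:
                selected.append(path)
    return selected, skipped


# Hypothetical paths and contents, standing in for globbed files.
FILES = {
    "jit_kernel/tests/test_a.py":
        'register_cuda_ci(est_time=30, suite="stage-b-kernel-unit-1-gpu-large")\n',
    "jit_kernel/tests/test_custom_all_reduce.py":
        'register_cuda_ci(est_time=60, suite="stage-b-kernel-unit-1-gpu-large", '
        'disabled="multi-GPU test")\n',
    "jit_kernel/benchmark/bench_b.py":
        'register_cuda_ci(est_time=20, suite="stage-b-kernel-benchmark-1-gpu-large")\n',
}

selected, skipped = select_files(FILES, "stage-b-kernel-unit-1-gpu-large")
print("run:", selected)
print("skip:", skipped)
```

Static parsing avoids importing each test module just to learn which suite it belongs to, which matters when importing would trigger kernel compilation or need a GPU.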

Made with Cursor

Migrate pr-test-jit-kernel.yml from raw pytest/shell invocation to the centralized run_suite.py + register_cuda_ci system, introducing three new suites:

- stage-b-kernel-unit-1-gpu-large (per-commit unit tests)
- stage-b-kernel-benchmark-1-gpu-large (per-commit benchmarks)
- nightly-kernel-1-gpu (nightly full tests)

Changes:

- Add register_cuda_ci calls to all 35 test files and 23 benchmark files
- Extend run_suite.py glob to discover jit_kernel tests/benchmarks
- Register new suites in PER_COMMIT_SUITES and NIGHTLY_SUITES
- Fix missing __main__ guards in test_nvfp4_*.py files
- Fix test_custom_all_reduce.py __main__ to handle both torchrun worker and direct pytest invocation
- Disable multi-GPU and self-skipping tests/benchmarks with reasons

Made-with: Cursor
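The `__main__` dispatch mentioned for test_custom_all_reduce.py could look something like the sketch below. The actual change is not shown on this page, so this is only one plausible shape. It relies on torchrun exporting `LOCAL_RANK` and `WORLD_SIZE` to each worker process; the function name `select_entrypoint` is hypothetical.

```python
import os
from typing import Mapping


def select_entrypoint(environ: Mapping[str, str]) -> str:
    """Return "worker" when launched by torchrun, otherwise "pytest"."""
    # torchrun sets these variables for every worker it spawns.
    if "LOCAL_RANK" in environ and "WORLD_SIZE" in environ:
        return "worker"
    return "pytest"


if __name__ == "__main__":
    mode = select_entrypoint(os.environ)
    # A real file would call its distributed worker function for "worker"
    # and hand off to pytest for "pytest"; this sketch only reports the mode.
    print(f"entry point: {mode}")
```

Keeping the decision in a pure function that takes the environment as an argument makes it testable without spawning processes.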

gemini-code-assist · 2026-03-23T23:32:21Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

merrymercy · 2026-03-23T23:50:13Z

/tag-and-rerun-ci

… logs

- Remove jit-kernel-unit-test-nightly from pr-test-jit-kernel.yml; add nightly-test-kernel-1-gpu-h100 to nightly-test-nvidia.yml with matching env (SGLANG_JIT_KERNEL_RUN_FULL_TESTS, SGLANG_JIT_DEEPGEMM_FAST_WARMUP, SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN) and workflow_dispatch filter.
- Refresh register_cuda_ci est_time values using observed wall times from PR #21239 run 23465359984: jit-kernel-unit-test (68277933098) and jit-kernel-benchmark-test (68277933115). Per-commit times use ~1.15x elapsed; nightly uses 4x stage-b (min 120, cap 1200) except FA4 (120/900; FA4 elapsed is a skip on H100).

Made-with: Cursor
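The est_time rule stated in this commit message can be written out as a small calculation. The factors (1.15x, 4x, min 120, cap 1200) are from the message; the function names and the choice to round to whole seconds are assumptions, and the FA4 exception is not modeled.

```python
def per_commit_est_time(elapsed_s: float, factor: float = 1.15) -> int:
    """Per-commit estimate: observed wall time with ~15% headroom."""
    return round(elapsed_s * factor)


def nightly_est_time(stage_b_est: float, multiplier: float = 4,
                     floor: int = 120, cap: int = 1200) -> int:
    """Nightly estimate: 4x the stage-b value, clamped to [floor, cap]."""
    return int(min(cap, max(floor, stage_b_est * multiplier)))


stage_b = per_commit_est_time(100)   # 100 s observed -> 115
print(stage_b, nightly_est_time(stage_b))
print(nightly_est_time(10))          # 40 is below the floor
print(nightly_est_time(400))         # 1600 is above the cap
```

The clamp keeps very fast tests from getting an unrealistically small nightly budget and keeps slow ones from dominating the suite's time estimate.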

…roject#21239)

merrymercy added 2 commits March 23, 2026 14:50

Migrate jit_kernel to registry system

ef2c984

merrymercy requested review from BBuf, DarkSharpness, Fridge003, HydraQYH, Kangyan-Zhou, bingxche, celve, ispobock and yuan-luo as code owners March 23, 2026 23:32

github-actions bot added the quant (LLM Quantization), lora, hicache (Hierarchical Caching for SGLang), blackwell (SM100/SM120), and jit-kernel labels Mar 23, 2026

github-actions bot added the run-ci label Mar 23, 2026

merrymercy merged commit 260abe1 into main Mar 24, 2026
62 of 129 checks passed

merrymercy deleted the lianmin/final-ci-cleanup branch March 24, 2026 04:17

merrymercy mentioned this pull request Mar 24, 2026

docs: update add-jit-kernel skill for run_suite CI registration #21264

Merged

1 task

adityavaid pushed a commit to adityavaid/sglang that referenced this pull request Mar 24, 2026

Refactor JIT kernel CI to use run_suite.py registration system (sgl-p…

59613cd

…roject#21239)

adityavaid pushed a commit to adityavaid/sglang that referenced this pull request Mar 24, 2026

Refactor JIT kernel CI to use run_suite.py registration system (sgl-p…

4504715

…roject#21239)

0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026

Refactor JIT kernel CI to use run_suite.py registration system (sgl-p…

165f7db

…roject#21239)

johnnycxm pushed a commit to johnnycxm/sglang that referenced this pull request Mar 25, 2026

Refactor JIT kernel CI to use run_suite.py registration system (sgl-p…

d1f2340

…roject#21239)

johnnycxm pushed a commit to johnnycxm/sglang that referenced this pull request Mar 25, 2026

Refactor JIT kernel CI to use run_suite.py registration system (sgl-p…

6fdd887

…roject#21239)

hnyls2002 mentioned this pull request Mar 27, 2026

[jit_kernel] Migrate cast (downcast_fp8) from sgl-kernel AOT to JIT #19103

Merged

5 tasks

JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026

Refactor JIT kernel CI to use run_suite.py registration system (sgl-p…

2a71544

…roject#21239)

merrymercy merged 3 commits into main from lianmin/final-ci-cleanup
